Abstract 

In this paper, we investigate the minimax properties of Stein block thresholding in 
any dimension d with a particular emphasis on d = 2. Towards this goal, we consider 
a frame coefficient space over which minimaxity is proved. The choice of this space 
is inspired by the characterization provided in [4] of a family of smoothness spaces on 
$\mathbb{R}^d$, a subclass of so-called decomposition spaces [23]. These smoothness spaces cover 
the classical case of Besov spaces, as well as smoothness spaces corresponding to 
curvelet-type constructions. Our main theoretical result investigates the minimax 
rates over these decomposition spaces, and shows that our block estimator can 
achieve the optimal minimax rate, or is at least nearly-minimax (up to a log factor) 
in the least favorable situation. Another contribution is that the minimax rates 
given here are stated for a general noise sequence model in the transform coefficient 
domain beyond the usual i.i.d. Gaussian case. The choice of the threshold parameter 
is theoretically discussed and its optimal value is stated for some noise models 
such as the (not necessarily i.i.d.) Gaussian case. We provide a simple, fast and 
practical procedure. We also report a comprehensive simulation study to support 
our theoretical findings. The practical performance of our Stein block denoising 
compares very favorably to the BLS-GSM state-of-the-art denoising algorithm on a 
large set of test images. A toolbox is made available for download on the Internet 
to reproduce the results discussed in this paper. 

1 Introduction 



Consider the nonparametric regression model:

$Y_i = f(i/n) + \epsilon_i, \quad i \in \{1, \ldots, n\}^d,$ (1.1)

where $d \in \mathbb{N}^*$ is the dimension of the data, $(Y_i)_{i \in \{1,\ldots,n\}^d}$ are the observations
regularly sampled on a d-dimensional Cartesian grid, $(\epsilon_i)_{i \in \{1,\ldots,n\}^d}$ are
independent and identically distributed (i.i.d.) $\mathcal{N}(0, 1)$, and $f : [0, 1]^d \to \mathbb{R}$ is an
unknown function. The goal is to estimate f from the observations. We want
to build an adaptive estimator $\hat{f}$ (i.e. its construction depends on the
observations only) such that the mean integrated squared error (MISE) defined by
$R(\hat{f}, f) = \mathbb{E}\left(\int_{[0,1]^d} (\hat{f}(x) - f(x))^2 \, dx\right)$ is as small as possible for a wide class
of f. A now classical approach to the study of nonparametric problems of the
form (1.1) is to, first, transform the data to obtain a sequence of coefficients,
second, analyze and process the coefficients (e.g. shrinkage, thresholding), and
finally, reconstruct the estimate from the processed coefficients. This approach
has been shown to be very successful by several authors, and good surveys
may be found in [28], [29], [30]. In particular, it is now well established that the
quality of the estimation is closely linked to the sparsity of the sequence of
coefficients representing f in the transform domain. Therefore, in this paper,
we focus our attention on transform-domain shrinkage methods, such as those
operating in the wavelet domain.



1.1 The one-dimensional case

First of all, let us consider the one-dimensional case d = 1. The most standard
of wavelet shrinkage methods is VisuShrink of [22]. It is constructed through
individual (or term-by-term) thresholding of the empirical wavelet coefficients.
It enjoys good theoretical (and practical) properties. In particular, it achieves
the optimal rate of convergence up to a logarithmic term over the Hölder class
under the MISE. In other words, if $\hat{f}^V$ denotes the VisuShrink estimate, and $\Lambda^s(M)$ the
Hölder smoothness class, then there exists a constant C > 0 such that

$\sup_{f \in \Lambda^s(M)} R(\hat{f}^V, f) \le C n^{-2s/(1+2s)} (\log n)^{2s/(1+2s)}.$ (1.2)



Other term-by-term shrinkage rules have been developed. See, for instance, the
firm shrinkage [?] or the non-negative garrote shrinkage [24]. In particular,
they satisfy (1.2) but improve the value of the constant C. An exhaustive
account of other shrinkage methods is provided in [?], to which the interested
reader may refer.

The individual approach achieves a degree of trade-off between the variance and
bias contributions to the MISE. However, this trade-off is not optimal: it
removes too many terms from the observed wavelet expansion, with the
consequence that the estimator is too biased and has a sub-optimal MISE convergence
rate (and also in other $L_p$ metrics, $1 < p < \infty$). One way to increase
estimation precision is by exploiting information about neighboring coefficients. In
other words, empirical wavelet coefficients tend to form clusters that could be
thresholded in blocks (or groups) rather than individually. This would allow
threshold decisions to be made more accurately and permit convergence rates
to be improved. Such a procedure has been introduced in [26], [27], who studied
wavelet shrinkage methods based on block thresholding. The procedure first
divides the wavelet coefficients at each resolution level into non-overlapping
blocks and then keeps all the coefficients within a block if, and only if, the
magnitude of the sum of the squared empirical coefficients within that block
is greater than a fixed threshold. The original procedure developed by [26], [27]
is defined with the block size $(\log n)^2$. BlockShrink of [?, ?] is the optimal
version of this procedure. It uses a different block size, $\log n$, and enjoys a
number of advantages over the conventional individual thresholding. In
particular, it achieves the optimal rate of convergence over the Hölder class under
the MISE. In other words, if $\hat{f}^B$ denotes the BlockShrink estimate, then there
exists a constant C > 0 such that

$\sup_{f \in \Lambda^s(M)} R(\hat{f}^B, f) \le C n^{-2s/(1+2s)}.$ (1.3)

Clearly, in comparison to VisuShrink, BlockShrink removes the extra
logarithmic term. The minimax properties of BlockShrink under the $L_p$ risk have
been studied in [?]. Other local block thresholding rules have been
developed. Among them, there is BlockJS of [?, ?], which combines the James-Stein
rule (see [40]) with the wavelet methodology. In particular, it satisfies (1.3)
but improves the value of the constant C. From a practical point of view, it is
better than BlockShrink. Further details about the theoretical performances
of BlockJS can be found in [17]. We refer to [3] and [9] for a comprehensive
simulation study. Variations of BlockJS are BlockSure of [21] and SureBlock
of [?]. The distinctive aspect of these block thresholding procedures is to
provide data-driven algorithms to choose the threshold parameter. Let us also
mention the work of [?], who considered wavelet block denoising in a Bayesian
framework to obtain level-dependent block shrinkage and thresholding
estimates.



1.2 The multi-dimensional case 



Denoising is a long-standing problem in image processing. Since the seminal
papers by Donoho & Johnstone [22], the image processing literature has been
inundated by hundreds of papers applying or proposing modifications of the
original algorithm in image denoising. Owing to recent advances in
computational harmonic analysis, many multi-scale geometrical transforms, such as
ridgelets [16], curvelets or bandlets [36], were shown to be very
effective in sparsely representing the geometrical content in images. Thanks to
the sparsity (or more precisely compressibility) property of these expansions,
it is reasonable to assume that essentially only a few large coefficients will
contain information about the underlying image, while small values can be
attributed to the noise which uniformly contaminates all transform coefficients.
Thus, the wavelet thresholding/shrinkage procedure can be mimicked for these
transforms, even though some care should be taken when the transform is
redundant (corresponding to a frame or a tight frame). The modus operandi is
again the same: first apply the transform, then apply a non-linear operator
to the coefficients (each coefficient individually or in groups of coefficients),
and finally apply the inverse transform to get an image estimate. Among the
many transform-domain image denoising algorithms to date, we would like to
cite [38], [39], [37], [33], which are amongst the most efficient in the literature.
Except [33], all cited approaches use orthodox Bayesian machinery and assume
different forms of multivariate priors over blocks of neighboring coefficients
and even interscale dependency. Nonetheless, none of those papers provide a
study of the theoretical performance of the estimators.



From a theoretical point of view, Candès [12] has shown that the
ridgelet-based individual coefficient thresholding estimator is nearly minimax for
recovering piecewise smooth images away from discontinuities along lines.
Individual thresholding of curvelet tight frame coefficients yields an estimator
that achieves a nearly-optimal minimax rate $O(n^{-4/3})$¹ (up to a logarithmic
factor) uniformly over the class of piecewise $C^2$ images away from
singularities along $C^2$ curves, the so-called $C^2$-$C^2$ images² [15]. Similarly, Le Pennec et
al. [?] have recently proved that individual thresholding in an adaptively
selected best bandlet orthobasis is nearly-minimax for $C^\alpha$ functions away from
$C^\alpha$ edges.

In the image processing community, block thresholding/shrinkage in a
non-Bayesian framework has been used very little. In [18], [19] the authors propose a
multi-channel block denoising algorithm in the wavelet domain. The
hyperparameters associated to their method (e.g. threshold) are derived using Stein's
risk estimator. Yu et al. [41] advocated the use of BlockJS [7] to denoise
audio signals in the time-frequency domain with anisotropic block size. To the
best of our knowledge, no theoretical study of the minimax properties of block
thresholding/shrinkage for images, and more generally for multi-dimensional
data, has been reported in the literature.



¹ It is supposed that the image has size n × n.
² Known as the cartoon model.

1.3 Contributions

In this paper, we propose a generalization of Stein block thresholding to any
dimension d. We investigate its minimax properties with a particular
emphasis on d = 2. Towards this goal, we consider a frame coefficient space over
which minimaxity is proved; see (3.2). The choice of this space is inspired by
the characterization provided in [4] of a family of smoothness spaces on $\mathbb{R}^d$, a
subclass of so-called decomposition spaces [?, 23]. We will elaborate more on
these (sparsity) smoothness spaces later in subsection 3.2. From this
characterization, it turns out that our frame coefficient spaces are closely related
to smoothness spaces that cover the classical case of Besov spaces, as well as
smoothness spaces corresponding to curvelet-type constructions in $\mathbb{R}^d$, $d \ge 2$.
Therefore, for d = 2 our denoiser will apply to both images with smoothness in
Besov spaces for which wavelets are known to provide a sparse representation,
and also to images that are compressible in the curvelet domain.



Our main theoretical result investigates the minimax rates over these decom- 
position spaces, and shows that our block estimator can achieve the optimal 
minimax rate, or is at least nearly-minimax (up to a log factor) in the least 
favorable situation. Another novelty is that the minimax rates given here are 
stated for a general noise sequence model in the transform coefficient domain 
beyond the usual i.i.d. Gaussian case. Thus, our result is particularly useful 
when the transform used corresponds to a frame, where a bounded zero-mean 
white Gaussian noise in the original domain is transformed into a bounded 
zero-mean correlated Gaussian process with a covariance matrix given by the 
Gram matrix of the frame. 



The choice of the threshold parameter is theoretically discussed and its
optimal value is stated for some noise models such as the (not necessarily i.i.d.)
Gaussian case. We provide a simple, fast and practical procedure. We report
a comprehensive simulation study to support our theoretical findings. It turns
out that the only two parameters of our Stein block denoiser (the block size
and the threshold) dictated by the theory work well for a large set of test
images and various transforms. Moreover, the practical performance of our Stein
block denoising compares very favorably to state-of-the-art methods such as
the BLS-GSM of [38]. Our procedure is however much simpler to implement
and has a much lower computational cost than orthodox Bayesian methods
such as BLS-GSM, since it does not involve any computationally expensive
integration or optimization steps. A toolbox is made available for download
on the Internet to reproduce the results discussed in this paper.

1.4 Organization of the paper



The paper is organized as follows. Section 2 is devoted to the one-dimensional 
BlockJS procedure introduced in [3]. In Section 3, we extend BlockJS to the 
multi-dimensional case and a fairly general noise model beyond the i.i.d. Gaus- 
sian case. This section also contains our main theoretical results. In Section 
4, a comprehensive experimental study is reported and discussed. We finally 
conclude in Section 5 and point to some perspectives. The proofs of the results 
are deferred to the appendix awaiting inspection by the interested reader. 



2 The one-dimensional BlockJS 



In this section, we present the construction and the theoretical performance 
of the one-dimensional BlockJS procedure developed by [3].

Consider the one-dimensional nonparametric regression model:

$Y_i = f(i/n) + \epsilon_i, \quad i = 1, \ldots, n,$ (2.1)

where $(Y_i)_{i=1,\ldots,n}$ are the observations, $(\epsilon_i)_{i=1,\ldots,n}$ are i.i.d. $\mathcal{N}(0, 1)$, and
$f : [0, 1] \to \mathbb{R}$ is an unknown function. The goal is to estimate f from the
observations. In the orthogonal wavelet framework, (2.1) amounts to the sequence
model

$y_{j,k} = \theta_{j,k} + n^{-1/2} z_{j,k}, \quad j = 0, \ldots, J, \quad k = 0, \ldots, 2^j - 1,$ (2.2)

where $J = \lfloor \log_2 n \rfloor$, $(y_{j,k})_{j,k}$ are the observations, for each j, $(z_{j,k})_k$ are i.i.d.
$\mathcal{N}(0, 1)$, and $(\theta_{j,k})_{j,k}$ are approximately the true wavelet coefficients of f. Since
they determine f completely, the goal is to estimate these coefficients as
accurately as possible. To assess the performance of an estimator $\hat\theta = (\hat\theta_{j,k})_{j,k}$ of
$\theta = (\theta_{j,k})_{j,k}$, we adopt the minimax approach under the expected squared
error over a given Besov body. The expected squared error is defined by
$R(\hat\theta, \theta) = \sum_{j=0}^{\infty} \sum_{k=0}^{2^j - 1} \mathbb{E}\left((\hat\theta_{j,k} - \theta_{j,k})^2\right)$, and the Besov body by

$\Theta^s_{p,q}(M) = \left\{ \theta = (\theta_{j,k})_{j,k}; \ \left( \sum_{j=0}^{\infty} \left( 2^{j(s + 1/2 - 1/p)} \left( \sum_{k=0}^{2^j - 1} |\theta_{j,k}|^p \right)^{1/p} \right)^q \right)^{1/q} \le M \right\}.$

In this notation, s > 0 is a smoothness parameter, $0 < p \le +\infty$ and $0 < q \le +\infty$
are norm parameters³, and $M \in (0, \infty)$ denotes the radius of the ball.
The Besov body contains a wide class of $\theta = (\theta_{j,k})_{j,k}$. It includes the Hölder
body $\Theta^s_{\infty,\infty}(M)$ and the Sobolev body $\Theta^s_{2,2}(M)$.



³ This is a slight abuse of terminology as for 0 < p, q < 1, Besov spaces are rather
complete quasinormed linear spaces.

The goal of the minimax approach is to construct an adaptive estimator $\hat\theta =
(\hat\theta_{j,k})_{j,k}$ such that $\sup_{\theta \in \Theta^s_{p,q}(M)} R(\hat\theta, \theta)$ is as small as possible. A candidate is
the BlockJS procedure whose paradigm is described below.

Let $L = \lfloor \log n \rfloor$ be the block size, $j_0 = \lfloor \log_2 L \rfloor$ the coarsest decomposition
scale and, for any j, let $\Lambda_j = \{0, \ldots, 2^j - 1\}$ be the set of locations at scale j. For
any $j \in \{j_0, \ldots, J\}$, let $\mathcal{A}_j = \{1, \ldots, \lfloor 2^j L^{-1} \rfloor\}$ be the set of block indices at scale
j, and for any $K \in \mathcal{A}_j$, let $U_{j,K} = \{k \in \Lambda_j; \ (K - 1)L \le k \le KL - 1\}$ be the
set indexing the locations of coefficients within the Kth block. Let $\lambda_*$ be a
threshold parameter chosen as the root of $x - \log x = 3$ (i.e. $\lambda_* = 4.50524\ldots$).
Now estimate $\theta = (\theta_{j,k})_{j,k}$ by $\hat\theta^* = (\hat\theta^*_{j,k})_{j,k}$ where, for any $k \in U_{j,K}$ and
$K \in \mathcal{A}_j$,

$\hat\theta^*_{j,k} = \begin{cases} y_{j,k}, & \text{if } j \in \{0, \ldots, j_0 - 1\}, \\ y_{j,k} \left( 1 - \dfrac{\lambda_* n^{-1}}{\frac{1}{L} \sum_{k \in U_{j,K}} y_{j,k}^2} \right)_+, & \text{if } j \in \{j_0, \ldots, J\}, \\ 0, & \text{if } j \in \mathbb{N} - \{0, \ldots, J\}, \end{cases}$ (2.3)

where $(x)_+ = \max(x, 0)$. Thus, at the coarsest scales $j \in \{0, \ldots, j_0 - 1\}$, the
observed coefficients $(y_{j,k})_{k \in \Lambda_j}$ are left intact as usual. For $k \in \Lambda_j$ and $j \in \mathbb{N} -
\{0, \ldots, J\}$, $\theta_{j,k}$ is estimated by zero. For $k \in U_{j,K}$, $K \in \mathcal{A}_j$ and $j \in \{j_0, \ldots, J\}$,
if the mean energy within the Kth block, $\sum_{k \in U_{j,K}} y_{j,k}^2 / L$, is larger than $\lambda_* n^{-1}$,
then $y_{j,k}$ is shrunk by the amount $y_{j,k} \dfrac{\lambda_* n^{-1}}{\frac{1}{L} \sum_{k \in U_{j,K}} y_{j,k}^2}$; otherwise, $\theta_{j,k}$ is estimated
by zero. Note that $\dfrac{\frac{1}{L} \sum_{k \in U_{j,K}} y_{j,k}^2}{n^{-1}}$ can be interpreted as a local measure of
signal-to-noise ratio in the block $U_{j,K}$. Such a block thresholding originates from the
James-Stein rule introduced in [40].

The block length $L = \lfloor \log n \rfloor$ and the value $\lambda_* = 4.50524\ldots$ are chosen based
on theoretical considerations; under this calibration, the BlockJS is (near)
optimal in terms of minimax rate and adaptivity. This is summarized in the
following theorem.

Theorem 2.1 ([?]) Consider the model (2.2) for n large enough. Let $\hat\theta^*$ be
given as in (2.3). Then there exists a constant C > 0 such that

$\sup_{\theta \in \Theta^s_{p,q}(M)} R(\hat\theta^*, \theta) \le C \begin{cases} n^{-2s/(2s+1)}, & \text{for } p \ge 2, \\ n^{-2s/(2s+1)} (\log n)^{(2-p)/(p(2s+1))}, & \text{for } p < 2, \ sp \ge 1. \end{cases}$ (2.4)

The rates of convergence (2.4) are optimal, except in the case p < 2 where
there is an extra logarithmic term. They are better than those achieved by
standard individual thresholding (hard, soft, non-negative garrote, etc.); we
gain a logarithmic factor for $p \ge 2$. See [22].
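As an illustration of rule (2.3), the following Python sketch applies the James-Stein block rule to the coefficients of a single scale $j \ge j_0$. The function name `blockjs_level` and its arguments are illustrative assumptions and are not taken from the toolbox mentioned in the paper. Coefficients that do not fit in a complete block are set to zero here, which is a simplifying choice of this sketch only.

```python
import numpy as np


def blockjs_level(y, n, L, lam=4.50524):
    """Shrink the wavelet coefficients y of one scale with the block rule (2.3).

    y   : 1-D array of empirical coefficients at a scale j >= j0
    n   : sample size (the noise level in (2.2) is n**-0.5)
    L   : block length
    lam : threshold constant lambda_* (root of x - log x = 3)
    """
    y = np.asarray(y, dtype=float)
    out = np.zeros_like(y)
    n_blocks = len(y) // L  # number of complete blocks, floor(2^j / L)
    for K in range(n_blocks):
        block = y[K * L:(K + 1) * L]
        mean_energy = np.mean(block ** 2)  # (1/L) * sum of squares in the block
        if mean_energy > 0.0:
            # James-Stein factor (1 - lam * n^-1 / mean_energy)_+
            factor = max(0.0, 1.0 - lam / n / mean_energy)
            out[K * L:(K + 1) * L] = factor * block
    return out
```

A block whose mean energy is below $\lambda_* n^{-1}$ gets a factor of zero, so the whole block is discarded; otherwise every coefficient of the block is multiplied by the same factor.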



3 The multi-dimensional BlockJS 

This section is the core of our proposal where we introduce a BlockJS-type 
procedure for multi-dimensional data. The goal is to adapt its construction 
in such a way that it preserves its optimal properties over a wide class of 
functions. 



3.1 The sequence model 

Our approach begins by projecting the model (1.1) onto a collection of atoms
$(\varphi_{j,\ell,k})_{j,\ell,k}$ that forms a (tight) frame. This gives rise to a sequence space model
obtained by calculating the noisy coefficient $y_{j,\ell,k} = \langle Y, \varphi_{j,\ell,k} \rangle$ for any element
of the frame $\varphi_{j,\ell,k}$. We then have a multi-dimensional sequence of coefficients
$(y_{j,\ell,k})_{j,\ell,k}$ defined by

$y_{j,\ell,k} = \theta_{j,\ell,k} + n^{-r/2} z_{j,\ell,k}, \quad j = 0, \ldots, J, \quad \ell \in B_j, \quad k \in D_j,$ (3.1)

where $J = \lfloor \log_2 n \rfloor$, $r \in [1, d]$, $d \in \mathbb{N}^*$, $B_j = \{1, \ldots, \lfloor c_* 2^{\upsilon j} \rfloor\}$, $c_* \ge 1$, $\upsilon \in [0, 1]$,
$k = (k_1, \ldots, k_d)$, $D_j = \prod_{i=1}^{d} \{0, \ldots, \lfloor 2^{\mu_i j} \rfloor - 1\}$, $(\mu_i)_{i=1,\ldots,d}$ is a sequence of positive
real numbers, $(z_{j,\ell,k})_{j,\ell,k}$ are random variables and $(\theta_{j,\ell,k})_{j,\ell,k}$ are unknown
coefficients. Let $d_* = \sum_{i=1}^{d} \mu_i$.

The indices j and k are respectively the scale and position parameters. $\ell$ is
a generic integer indexing, for example, the orientation (subband), which may
be scale-dependent. The parameters $(\mu_i)_{i=1,\ldots,d}$ make it possible to handle anisotropic
subbands. To illustrate the meaning of these parameters, let us see how they
specialize in some popular transforms. For example, with the separable
two-dimensional wavelet transform, we have $\upsilon = 0$, $c_* = 3$, and $\mu_1 = \mu_2 = 1$. Thus,
as expected, we get three isotropic subbands at each scale. For the second
generation curvelet transform [13], we have $\upsilon = 1/2$, $\mu_1 = 1$ and $\mu_2 = 1/2$,
which corresponds to the parabolic scaling of curvelets.
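To make the role of these parameters concrete, the short Python sketch below evaluates $d_*$, the number of subbands $|B_j|$ and the number of positions $|D_j|$ from the definitions under (3.1). The function name `frame_geometry` is a hypothetical helper, and the value $c_* = 1$ used for the curvelet call in the usage note is an assumption made only for illustration, since the text does not give $c_*$ for curvelets.

```python
import math


def frame_geometry(j, mu, c_star, upsilon):
    """Return (d_star, |B_j|, |D_j|) for scale j, following the definitions under (3.1).

    mu      : tuple (mu_1, ..., mu_d) of anisotropy exponents
    c_star  : constant c_* in B_j = {1, ..., floor(c_* 2^(upsilon j))}
    upsilon : growth exponent of the number of subbands
    """
    d_star = sum(mu)
    n_subbands = math.floor(c_star * 2 ** (upsilon * j))
    n_positions = math.prod(math.floor(2 ** (m * j)) for m in mu)
    return d_star, n_subbands, n_positions
```

For the separable two-dimensional wavelet transform, `frame_geometry(3, (1, 1), 3, 0)` gives $d_* = 2$ with three subbands of $8 \times 8$ positions. With the curvelet-type values $\mu = (1, 1/2)$ and $\upsilon = 1/2$, the number of subbands grows with the scale while each subband is anisotropic.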

3.1.1 Assumptions on the noise sequence 

Let $L = \lfloor (r \log n)^{1/d} \rfloor$ be the block length, $j_0 = \lfloor (1/\min_{i=1,\ldots,d} \mu_i) \log_2 L \rfloor$
the coarsest decomposition scale, and $J_* = \lfloor (r/(d_* + \delta + \upsilon)) \log_2 n \rfloor$. For any
$j \in \{j_0, \ldots, J_*\}$, let

• $\mathcal{A}_j = \prod_{i=1}^{d} \{1, \ldots, \lfloor 2^{\mu_i j} L^{-1} \rfloor\}$ be the set indexing the blocks at scale j.

• For each block index $K = (K_1, \ldots, K_d) \in \mathcal{A}_j$, $U_{j,K} = \{k \in D_j; \ (K_1 - 1)L \le
k_1 \le K_1 L - 1, \ldots, (K_d - 1)L \le k_d \le K_d L - 1\}$ is the set indexing the
positions of coefficients within the Kth block $U_{j,K}$.

Our assumptions on the noise model are as follows. Suppose that there exist
$\delta \ge 0$, $\lambda_* > 0$, $Q_1 > 0$ and $Q_2 > 0$ independent of n such that

(A1) $\sup_{j \in \{0,\ldots,J\}} \sup_{\ell \in B_j} 2^{-j(d_* + \delta)} \sum_{k \in D_j} \mathbb{E}(z_{j,\ell,k}^2) \le Q_1$.

(A2) $\sum_{j=j_0}^{J_*} \sum_{\ell \in B_j} \sum_{K \in \mathcal{A}_j} \sum_{k \in U_{j,K}} \mathbb{E}\left( z_{j,\ell,k}^2 \, \mathbf{1}_{\left\{ \sum_{k \in U_{j,K}} z_{j,\ell,k}^2 > \lambda_* 2^{\delta j} L^d / 4 \right\}} \right) \le Q_2$.

Assumptions (A1) and (A2) are satisfied for a wide class of noise models on the
sequence $(z_{j,\ell,k})_{j,\ell,k}$ (not necessarily independent or identically distributed).
Several such noise models are characterized in Propositions 3.1 and 3.2 below.

Remark 3.1 (Comments on $\delta$) The parameter $\delta$ is connected to the nature
of the model. For standard models, and in particular, the d-dimensional
nonparametric regression corresponding to the problem of denoising (see Section
4), $\delta$ is set to zero. The presence of $\delta$ in our assumptions, definitions and
results is motivated by potential applicability of the multi-dimensional BlockJS
(to be defined in Subsection 3.3) to other inverse problems such as
deconvolution. The role of $\delta$ becomes of interest when addressing such inverse problems.
This will be the focus of a future work. To illustrate the importance of $\delta$ in
one-dimensional deconvolution, see [31].



3.2 The smoothness space 



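We wish to estimate $(\theta_{j,\ell,k})_{j,\ell,k}$ from $(y_{j,\ell,k})_{j,\ell,k}$ defined by (3.1). To measure the
performance of an estimator $\hat\theta = (\hat\theta_{j,\ell,k})_{j,\ell,k}$ of $\theta = (\theta_{j,\ell,k})_{j,\ell,k}$, we consider the
minimax approach under the expected multi-dimensional squared error over
a multi-dimensional frame coefficient space. The expected multi-dimensional
squared error is defined by

$R(\hat\theta, \theta) = \sum_{j=0}^{\infty} \sum_{\ell \in B_j} \sum_{k \in D_j} \mathbb{E}\left( (\hat\theta_{j,\ell,k} - \theta_{j,\ell,k})^2 \right)$

and the multi-dimensional frame coefficient smoothness/sparseness space by

$\Theta^s_{p,q}(M) = \left\{ \theta = (\theta_{j,\ell,k})_{j,\ell,k}; \ \left( \sum_{j=0}^{\infty} \sum_{\ell \in B_j} \left( 2^{j(s + d_*/2 - d_*/p)} \left( \sum_{k \in D_j} |\theta_{j,\ell,k}|^p \right)^{1/p} \right)^q \right)^{1/q} \le M \right\},$ (3.2)

with a smoothness parameter s, $0 < p \le +\infty$ and $0 < q \le +\infty$. We recall
that $d_* = \sum_{i=1}^{d} \mu_i$.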

The definition of these smoothness spaces is motivated by the work of [4].
These authors studied decomposition spaces associated to appropriate
structured uniform partitions of the unity in the frequency space $\mathbb{R}^d$. They
considered the construction of tight frames adapted to form atomic decompositions of the
associated decomposition spaces, and established norm equivalence between
these smoothness/sparseness spaces and the sequence norm defined in (3.2).
That is, the decomposition space norm can be completely characterized by
the sparsity or decay behavior of the associated frame coefficients.

For example, in the case of a "uniform" dyadic partition of the unity, the
smoothness/sparseness space is a Besov space $B^s_{p,q}$, for which a suitable wavelet
expansion⁴ is known to provide a sparse representation [?]. In this case, from
subsection 3.1 we have $d_* = d$, and $\Theta^s_{p,q}(M)$ is a d-dimensional Besov ball.

Curvelets in arbitrary dimensions correspond to partitioning the frequency
plane into dyadic coronae, which are then angularly localized near regions
of side length $2^j$ in the radial direction and $2^{j/2}$ in all the other directions
[11]. For d = 2, the angular wedges obey the parabolic scaling law [14]. This
partition of the frequency plane is significantly different from dyadic
decompositions, and as a consequence, sparseness for curvelet expansions cannot
be described in terms of classical smoothness spaces. For d = 2, Borup and
Nielsen [4, Lemma 10] showed that the smoothness/sparseness space (3.2)
and the smoothness/sparseness space of the second-generation curvelets [13] are the
same, in which case $d_* = 3/2$. Embedding results for curvelet-type
decomposition spaces relative to Besov spaces were also provided in [3]. Furthermore,
it was shown that piecewise $C^2$ images away from piecewise-$C^2$ singularities,
which are sparsely represented in the curvelet tight frame [?], are contained
in $\Theta^{3/2+\beta}_{2/3,2/3}$, $\forall \beta > 0$, even though the role and the range of $\beta$ have not been
clarified by the authors in [4].



3.3 Multi-dimensional block estimator

As for the one-dimensional case, we wish to construct an adaptive estimator
$\hat\theta = (\hat\theta_{j,\ell,k})_{j,\ell,k}$ such that $\sup_{\theta \in \Theta^s_{p,q}(M)} R(\hat\theta, \theta)$ is as small as possible. To reach
this goal, we propose a multi-dimensional version of the BlockJS procedure
introduced in [?].

From subsection 3.1.1, recall the definitions of L, $j_0$, $J_*$, $\mathcal{A}_j$ and $U_{j,K}$. We
estimate $\theta = (\theta_{j,\ell,k})_{j,\ell,k}$ by $\hat\theta^* = (\hat\theta^*_{j,\ell,k})_{j,\ell,k}$ where, for any $k \in U_{j,K}$, $K \in \mathcal{A}_j$
and $\ell \in B_j$,

$\hat\theta^*_{j,\ell,k} = \begin{cases} y_{j,\ell,k}, & \text{if } j \in \{0, \ldots, j_0 - 1\}, \\ y_{j,\ell,k} \left( 1 - \dfrac{\lambda_* 2^{\delta j} n^{-r}}{\frac{1}{L^d} \sum_{k \in U_{j,K}} y_{j,\ell,k}^2} \right)_+, & \text{if } j \in \{j_0, \ldots, J_*\}, \\ 0, & \text{if } j \in \mathbb{N} - \{0, \ldots, J_*\}. \end{cases}$ (3.3)

In this definition, $\delta$ and $\lambda_*$ denote the constants involved in (A1) and (A2).
Again, the coarsest scale coefficients are left unaltered, while the other
coefficients are either thresholded or shrunk depending on whether the local measure
of signal-to-noise ratio $\dfrac{\frac{1}{L^d} \sum_{k \in U_{j,K}} y_{j,\ell,k}^2}{n^{-r}}$ within the block $U_{j,K}$ is larger than the
threshold $\lambda_* 2^{\delta j}$. Notice that the dimension d of the model appears in the
definition of L, the length of each block $U_{j,K}$. This point is crucial; L optimizes
the theoretical and practical performance of the considered multi-dimensional
BlockJS procedure. As far as the choice of the threshold parameter $\lambda_*$ is
concerned, it will be discussed in Subsection 3.5 below.

⁴ With a wavelet having sufficient regularity and number of vanishing moments
[34].
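The block rule (3.3) can be sketched for d = 2 as follows. This is a minimal illustration under stated assumptions, not the authors' toolbox: the function name `blockjs_subband` is hypothetical, the input is a single subband stored as a two-dimensional array, blocks are non-overlapping $L \times L$ squares, positions outside complete blocks are set to zero, and the threshold is taken as $\lambda_* 2^{\delta j} n^{-r}$ as described in the text.

```python
import numpy as np


def blockjs_subband(y, n, L, lam=4.50524, r=2, delta=0.0, j=0):
    """Apply the multi-dimensional block rule to one 2-D subband at scale j.

    y     : 2-D array of noisy frame coefficients y_{j,l,k} for fixed (j, l)
    n     : side length of the image (noise level n**(-r/2) in (3.1))
    L     : block length, so that each block holds L * L coefficients
    lam   : threshold constant lambda_*
    r     : exponent of the noise level
    delta : model parameter delta (zero for denoising)
    """
    y = np.asarray(y, dtype=float)
    out = np.zeros_like(y)
    thresh = lam * 2.0 ** (delta * j) * float(n) ** (-r)
    rows, cols = y.shape
    for a in range(0, rows - L + 1, L):
        for b in range(0, cols - L + 1, L):
            block = y[a:a + L, b:b + L]
            mean_energy = np.mean(block ** 2)  # (1/L^d) * sum of squares
            if mean_energy > thresh:
                # shrink the whole block by the James-Stein factor
                out[a:a + L, b:b + L] = block * (1.0 - thresh / mean_energy)
    return out
```

In a full denoiser this function would be called once per subband $(j, \ell)$ with $j_0 \le j \le J_*$, the coarsest scales being copied unchanged and the finest ones set to zero.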



3.4 Minimax theorem 



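Theorem 3.1 below investigates the minimax rate of (3.3) over $\Theta^s_{p,q}$.

Theorem 3.1 Consider the model (3.1) for n large enough. Suppose that (A1)
and (A2) are satisfied. Let $\hat\theta^*$ be given as in (3.3).

• There exists a constant C > 0 such that

$\sup_{\theta \in \Theta^s_{p,q}(M)} R(\hat\theta^*, \theta) \le C \rho_n,$

where

$\rho_n = \begin{cases} n^{-2sr/(2s + \delta + d_* + \upsilon)}, & \text{for } q \le 2 \le p, \\ (\log n / n)^{2sr/(2s + \delta + d_* + \upsilon)}, & \text{for } q \le p < 2, \ sp > d_* \vee (1 - p/2)(\delta + d_* + \upsilon). \end{cases}$ (3.4)

• If $\upsilon = 0$, the minimax rates (3.4) hold without the restriction $q \le p \wedge 2$.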

The rates of convergence (3.4) are optimal for a wide class of variables $(z_{j,\ell,k})_{j,\ell,k}$.
If we take $d_* = d = \mu_1 = 1$, $r = 1$, $c_* = 1$ and $\upsilon = \delta = 0$, then we recover
the rates exhibited in the one-dimensional wavelet case expressed in Theorem
2.1. There is only a minor difference in the power of the logarithmic term for
p < 2. Thus, Theorem 3.1 can be viewed as a generalization of Theorem 2.1.

In the case of d-dimensional isotropic Besov spaces, where wavelets
(corresponding to $\upsilon = 0$, $\mu_1 = \mu_2 = 1$ and then $d_* = d$) provide optimally sparse
representations, Theorem 3.1 can be applied without the restriction $q \le p \wedge 2$.
Therefore, for $p \ge 2$, Theorem 3.1 states that Stein block thresholding gets
rid of the logarithmic factor, hence achieving the optimal minimax rate over
those Besov spaces. For p < 2, the block estimator is nearly-minimax.

As far as curvelet-type decomposition spaces are concerned, from section 3.1
we have $\mu_1 = 1$, $\mu_2 = \frac{1}{2}$, $d_* = \mu_1 + \mu_2 = \frac{3}{2}$, $r = d = 2$, $\upsilon = \frac{1}{2}$, $\delta = 0$. This gives
the rates

$\rho_n = \begin{cases} n^{-2s/(s+1)}, & \text{for } q \le 2 \le p, \\ (\log n / n)^{2s/(s+1)}, & \text{for } q \le p < 2, \ sp > \frac{3}{2} \vee (2 - p), \end{cases}$

where the logarithmic factor disappears only for $q \le 2 \le p$. Following the
discussion of section 3.2, $C^2$-$C^2$ images correspond to a smoothness space
$\Theta^s_{p,q}$ with p = q = 2/3. Moreover, $\exists \kappa > 0$ such that taking $s = 2 + \kappa$ satisfies
the condition of Theorem 3.1, and $C^2$-$C^2$ images are contained in $\Theta^s_{2/3,2/3}$ with
such a choice. We then arrive at the rate $O(n^{-4/3})$ (ignoring the logarithmic
factor). This is consistent with the results of [?], which established that no
estimator can achieve a better rate than the optimal minimax rate $O(n^{-4/3})$
uniformly over the $C^2$-$C^2$ class. On the other hand, individual thresholding
in the curvelet tight frame also has the nearly-minimax rate $O(n^{-4/3})$ [15]
uniformly over the class of $C^2$-$C^2$ images. Nonetheless, the experimental results
reported in this paper indicate that block curvelet thresholding outperforms
term-by-term thresholding in practice on a wide variety of images, although
the improvement can be of a limited extent.
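The specializations of the rate exponent discussed above can be checked with exact arithmetic. The following Python sketch evaluates the exponent $2sr/(2s + \delta + d_* + \upsilon)$ of (3.4) and the right-hand side of the smoothness condition for p < 2; the function names `rate_exponent` and `smoothness_bound` are illustrative and not part of the paper.

```python
from fractions import Fraction


def rate_exponent(s, r, d_star, delta=0, upsilon=0):
    """Exponent 2sr / (2s + delta + d_* + upsilon) appearing in the rates (3.4)."""
    s, r, d_star, delta, upsilon = (
        Fraction(v) for v in (s, r, d_star, delta, upsilon)
    )
    return 2 * s * r / (2 * s + delta + d_star + upsilon)


def smoothness_bound(p, d_star, delta=0, upsilon=0):
    """Right-hand side of the condition s*p > max(d_*, (1 - p/2)(delta + d_* + upsilon))."""
    p, d_star, delta, upsilon = (Fraction(v) for v in (p, d_star, delta, upsilon))
    return max(d_star, (1 - p / 2) * (delta + d_star + upsilon))


# Curvelet-type setting: d_* = 3/2, r = 2, upsilon = 1/2, delta = 0, smoothness s = 2.
curvelet = rate_exponent(2, 2, Fraction(3, 2), 0, Fraction(1, 2))
# One-dimensional wavelet setting: d_* = r = 1, delta = upsilon = 0, here with s = 1.
wavelet_1d = rate_exponent(1, 1, 1)
```

With s = 2 the curvelet-type exponent equals 4/3, matching the rate $O(n^{-4/3})$ quoted in the text, and the one-dimensional setting reduces to $2s/(2s+1)$ as in Theorem 2.1.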



3.5 On the (theoretical) choice of the threshold 



To apply Theorem 3.1, it is enough to determine $\delta$ and $\lambda_*$ such that (A1) and
(A2) are satisfied. The parameter $\delta$ is imposed by the nature of the model;
it can be easily fixed as in our denoising experiments where it was set to
$\delta = 0$. The choice of the threshold $\lambda_*$ is more involved. This choice is crucial
towards good performance of the estimator $\hat\theta^*$. From a theoretical point of
view, since the constant C of the bound (3.4) increases with growing $\lambda_*$, the
optimal threshold is the smallest real number $\lambda_*$ such that (A2) is fulfilled. In
the following, we first provide the explicit expression of $\lambda_*$ in the situation of
a not necessarily i.i.d. Gaussian noise sequence $(z_{j,\ell,k})_{j,\ell,k}$. This result is then
refined in the case of a white Gaussian noise.

Proposition 3.1 below determines a suitable threshold $\lambda_*$ satisfying (A1) and
(A2) when $(z_{j,\ell,k})_{j,\ell,k}$ are Gaussian random variables (not necessarily i.i.d.).

Proposition 3.1 Consider the model (3.1) for n large enough. Suppose that,
for any $j \in \{0, \ldots, J\}$ and any $\ell \in B_j$, $(z_{j,\ell,k})_k$ is a centered Gaussian process.
Assume that there exist two constants $Q_3 > 0$ and $Q_4 > 0$ (independent of
n) such that

• (A3): $\sup_{j \in \{0,\ldots,J\}} \sup_{\ell \in B_j} \sup_{k \in D_j} 2^{-2\delta j} \mathbb{E}(z_{j,\ell,k}^4) \le Q_3$.

• (A4): for any $a = (a_k)_{k \in D_j}$ such that $\sup_{j \in \{0,\ldots,J\}} \sup_{K \in \mathcal{A}_j} \sum_{k \in U_{j,K}} a_k^2 \le 1$,
we have

$\sup_{j \in \{0,\ldots,J\}} \sup_{\ell \in B_j} \sup_{K \in \mathcal{A}_j} 2^{-\delta j} \mathbb{E}\left( \left( \sum_{k \in U_{j,K}} a_k z_{j,\ell,k} \right)^2 \right) \le Q_4.$

Then (A1) and (A2) are satisfied with $\lambda_* = 4\left((2Q_4)^{1/2} + Q_3^{1/4}\right)^2$. Therefore
Theorem 3.1 can be applied to $\hat\theta^*$ defined by (3.3) with such a $\lambda_*$.



This result is useful as it establishes that the block denoising procedure and
the minimax rates of Theorem 3.1 apply to the case of frames where a bounded
zero-mean white Gaussian noise in the original domain is transformed into a
bounded zero-mean correlated Gaussian process.

If additional information is available on $(z_{j,\ell,k})_{j,\ell,k}$, the threshold constant $\lambda_*$
defined in Proposition 3.1 can be improved. This is the case when $(z_{j,\ell,k})_{j,\ell,k}$
are i.i.d. $\mathcal{N}(0, 1)$, as happens when the transform is orthogonal (e.g. orthogonal
wavelet transform). The statement is made formal in the following proposition.



Proposition 3.2 Consider the model (3.1) for n large enough. Suppose that,
for any $j \in \{0, \ldots, J\}$ and any $\ell \in B_j$, $(z_{j,\ell,k})_k$ are i.i.d. $\mathcal{N}(0, 1)$, as is the case
when the transform used corresponds to an orthobasis. Then Theorem 3.1 can be
applied with the estimator $\hat\theta^*$ defined by (3.3) with $\delta = 0$ and $\lambda_*$ the root of
$x - \log x = 3$, i.e. $\lambda_* = 4.50524\ldots$.
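The two threshold constants can be evaluated numerically. The sketch below, with the hypothetical helper names `stein_threshold` and `gaussian_frame_threshold`, finds the root of $x - \log x = 3$ that is larger than 1 by bisection, and evaluates the expression $4((2Q_4)^{1/2} + Q_3^{1/4})^2$ of Proposition 3.1. The values $Q_3 = 3$ and $Q_4 = 1$ in the usage note are those of i.i.d. standard Gaussian noise with $\delta = 0$, chosen here only as an illustration.

```python
import math


def stein_threshold(tol=1e-12):
    """Root of x - log(x) = 3 on [1, 10], found by bisection."""
    lo, hi = 1.0, 10.0  # x - log(x) - 3 is increasing for x >= 1 and changes sign here
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid - math.log(mid) - 3.0 < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gaussian_frame_threshold(q3, q4):
    """Threshold 4 * ((2 * Q4)**(1/2) + Q3**(1/4))**2 of Proposition 3.1."""
    return 4.0 * (math.sqrt(2.0 * q4) + q3 ** 0.25) ** 2
```

For instance, `gaussian_frame_threshold(3.0, 1.0)` is several times larger than `stein_threshold()`, which illustrates why the general bound of Proposition 3.1 is refined in Proposition 3.2 when the noise is i.i.d.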



The optimal threshold constant $\lambda_*$ described in Proposition 3.2 corresponds
to the one isolated by [?].

4 Application to image block denoising

4.1 Impact of threshold and block size

In this first experiment, the goal is twofold: first assess the impact of the
threshold and the block size on the performance of block denoising, and second
investigate the validity of their choice as prescribed by the theory. For an n × n
image f and its estimate $\hat{f}$, the denoising performance is measured in terms
of peak signal-to-noise ratio (PSNR) in decibels (dB)

$\mathrm{PSNR} = 20 \log_{10} \dfrac{n \|f\|_\infty}{\|\hat{f} - f\|_2}$ dB.

In this experiment, as well as in the rest of the paper, three popular transforms
are used: the orthogonal wavelet transform (DWT), its translation invariant
version (UDWT) and the second generation fast discrete curvelet transform
(FDCT) with the wrapping implementation [?]. The Symmlet wavelet with 6
vanishing moments was used throughout all experiments. For each transform,
two images were tested, Barbara (512 × 512) and Peppers (256 × 256), and each
image was contaminated with zero-mean white Gaussian noise with increasing
standard deviation $\sigma \in \{5, 10, 15, 20, 25, 30\}$, corresponding to input PSNR
values {34.15, 28.13, 24.61, 22.11, 20.17, 18.59, 14.15} dB. At each combination
of test image and noise level, ten noisy versions were generated. Then, block
denoising was applied to each of the ten noisy images for each block
size $L \in \{1, 2, 4, 8, 16\}$ and threshold $\lambda \in \{2, 3, 4, 4.5, 5, 6\}$, and the average
output PSNR over the ten realizations was computed. This yields one plot of
average output PSNR as a function of $\lambda$ and L at each combination (image,
noise level, transform). The results are depicted in Fig. 1, Fig. 2 and Fig. 3 for
respectively the DWT, UDWT and FDCT. One can see that the maximum
of PSNR occurs at L = 4 (for $\lambda \ge 3$) whatever the transform and image, and
this value turns out to be the choice dictated by the theoretical procedure. As far
as the influence of $\lambda$ is concerned, the PSNR attains its exact highest peak at
different values of $\lambda$ depending on the image, transform and noise level. For
the DWT, this maximum PSNR takes place near the theoretical threshold
$\lambda_* \approx 4.5$ as expected from Proposition 3.2. Even with the other redundant
transforms, which correspond to tight frames for which Proposition 3.2 is not
rigorously valid, a sort of plateau is reached near $\lambda = 4.5$. Only a minor
improvement can be gained by taking a higher threshold $\lambda$; see e.g. Fig. 2 or
3 with Peppers for $\sigma > 20$. Note that this improvement by taking a higher $\lambda$
for redundant transforms (i.e. non i.i.d. Gaussian noise) is formally predicted
by Proposition 3.1, even though the estimate of Proposition 3.1 was expected
to be rather crude. To summarize, the value 4.50524... intended to work for
orthobases seems to yield good results also with redundant transforms. 
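
To make the roles of the block size $L$ and the threshold $\lambda$ concrete, here is a minimal sketch of block James-Stein shrinkage on a single 2D array of transform coefficients. It follows the form of the shrinkage rule that appears in the lemmas of the appendix (each block is multiplied by $(1 - \lambda m \sigma^2 / \sum u_i^2)_+$, with $m$ the number of coefficients in the block). The function name `block_js_shrink`, the non-overlapping tiling, the handling of incomplete boundary blocks and the assumption of a known, scale-independent noise level are all simplifications made for illustration; this is not the toolbox implementation.

```python
def block_js_shrink(coeffs, sigma, L=4, lam=4.50524):
    """Shrink a 2D list of coefficients block by block.

    Each L x L block is multiplied by max(0, 1 - lam * m * sigma^2 / energy),
    where m is the number of coefficients in the block and energy is the sum
    of their squares. Blocks cut by the array boundary are simply smaller.
    """
    rows, cols = len(coeffs), len(coeffs[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r0 in range(0, rows, L):
        for c0 in range(0, cols, L):
            block = [(r, c)
                     for r in range(r0, min(r0 + L, rows))
                     for c in range(c0, min(c0 + L, cols))]
            energy = sum(coeffs[r][c] ** 2 for r, c in block)
            if energy > 0.0:
                gain = max(0.0, 1.0 - lam * len(block) * sigma ** 2 / energy)
            else:
                gain = 0.0  # an all-zero block stays at zero
            for r, c in block:
                out[r][c] = gain * coeffs[r][c]
    return out


# A block well above the noise level is barely attenuated,
# while a block at the noise level is set to zero.
strong = block_js_shrink([[10.0] * 4 for _ in range(4)], sigma=1.0)
weak = block_js_shrink([[1.0] * 4 for _ in range(4)], sigma=1.0)
```

In the experiment described above, a routine of this kind would be applied to every subband of the chosen transform for each pair $(L, \lambda)$, followed by the inverse transform.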
Fig. 1. Output PSNR as a function of the block size and the threshold $\lambda$ at different
noise levels $\sigma \in \{5, 10, 15, 20, 25, 30\}$. Block denoising was applied in the DWT
domain.

Fig. 2. Output PSNR as a function of the block size and the threshold $\lambda$ at different
noise levels $\sigma \in \{5, 10, 15, 20, 25, 30\}$. Block denoising was applied in the UDWT
domain.

Fig. 3. Output PSNR as a function of the block size and the threshold $\lambda$ at different
noise levels $\sigma \in \{5, 10, 15, 20, 25, 30\}$. Block denoising was applied in the FDCT
domain.

4.2 Comparative study

Block vs term-by-term It is instructive to quantify the improvement
brought by block denoising compared to term-by-term thresholding. For a reliable
comparison, we applied the denoising algorithms to six standard grayscale
images with different contents, of size 512 × 512 (Barbara, Lena, Boat and
Fingerprint) and 256 × 256 (House and Peppers). All images were normalized
to a maximum grayscale value of 255. The images were corrupted by zero-mean
white Gaussian noise with standard deviation $\sigma \in \{5, 10, 15, 20, 25, 30\}$.
The output PSNR was averaged over ten realizations, and all algorithms were
applied to the same noisy versions. The threshold used with individual
thresholding was set to the classical value $3\sigma$ for the (orthogonal) DWT, and to $3\sigma$ for
all scales and $4\sigma$ at the finest scale for the (redundant) UDWT and FDCT. The
results are displayed in Fig. 4. Each plot corresponds to PSNR improvement
over DWT term-by-term thresholding as a function of $\sigma$. To summarize:

• Block shrinkage improves the denoising results in general compared to
individual thresholding, even though the extent of the improvement decreases
with increasing $\sigma$. The PSNR increase brought by block denoising with a given
transform compared to individual thresholding with the same transform can
be up to 2.55 dB.

• Owing to block shrinkage, even the orthogonal DWT becomes competitive
with redundant transforms. For Barbara, block denoising with the DWT is even
better than individual thresholding in the translation-invariant UDWT.

• For some images (e.g. Peppers or House), block denoising with curvelets can
be slightly outperformed by its term-by-term thresholding counterpart for
$\sigma = 50$.

• As expected, no transform is the best for all images. Block denoising with
curvelets is more beneficial to images with high-frequency content (e.g.
anisotropic oscillating patterns in Barbara). For the other images, except
Peppers, block denoising with the UDWT and with curvelets are comparable
($\approx 0.2$ dB difference).

Note that the additional computational burden of block shrinkage compared
to individual thresholding is limited: respectively 0.1 s, 1 s and 0.7 s for the
DWT, UDWT and FDCT with 512 × 512 images, and less than 0.03 s, 0.2 s
and 0.1 s for 256 × 256 images. The algorithms were run under Matlab with an
Intel Xeon 3GHz CPU, 8Gb RAM.



Block vs BLS-GSM The described block denoising procedure has been
compared to one of the state-of-the-art denoising methods in the literature,
BLS-GSM [38]. BLS-GSM is a widely used reference in image denoising
experiments reported in the literature. BLS-GSM uses a sophisticated prior model
of the joint distribution within each block of coefficients, and then computes
the Bayesian posterior conditional mean estimator by numerical integration.
For a fair comparison, BLS-GSM was also adapted and implemented with the
curvelet transform. The two algorithms were applied to the same ten
realizations of additive white Gaussian noise with $\sigma$ in the same range as before.
The output PSNR values averaged over the ten realizations for each of the six
tested images are tabulated in Table 2. By inspection of this table, the
performance of block denoising and that of BLS-GSM remain comparable whatever the
transform and image. Neither of them outperforms the other for all transforms
and all images. When comparing both algorithms for the DWT,
the maximum difference between the corresponding PSNR values is 0.5 dB in
favor of block shrinkage. For the UDWT and FDCT, the maximum difference
is $\approx 0.6$ dB in favor of BLS-GSM. Visual inspection of Figs. 5 and 6 is in agreement
with the quantitative study we have just discussed. For each transform,
differences between the two denoisers are hardly visible. Our procedure is however
much simpler to implement and has a much lower computational cost than
BLS-GSM, as can be seen from Table 1. Our algorithm can be up to 10 times
faster than BLS-GSM while reaching comparable denoising performance. As
stated in the previous paragraph, the bulk of computation in our algorithm is
essentially invested in computing the forward and inverse transforms.
Fig. 4. Block vs term-by-term thresholding. Each plot corresponds to PSNR
improvement over DWT term-by-term thresholding as a function of $\sigma$.




Fig. 5. Visual comparison of our block denoising to BLS-GSM on Barbara 512 × 512.
(a) original, (b) noisy, $\sigma = 20$. (c), (e) and (g) block denoising with respectively DWT
(28.04 dB), UDWT (29.01 dB) and FDCT (30 dB). (d), (f) and (h) BLS-GSM with
respectively DWT (28.6 dB), UDWT (29.3 dB) and FDCT (30.07 dB).
Fig. 6. Visual comparison of our block denoising to BLS-GSM on Lena 512 × 512. (a)
original, (b) noisy, $\sigma = 20$. (c), (e) and (g) block denoising with respectively DWT
(30.51 dB), UDWT (31.47 dB) and FDCT (31.48 dB). (d), (f) and (h) BLS-GSM
with respectively DWT (30.62 dB), UDWT (32 dB) and FDCT (31.6 dB).
512 × 512 image
              DWT      UDWT     FDCT
Block         0.22     2.6      5.8
BLS-GSM       3        26       30

256 × 256 image
              DWT      UDWT     FDCT
Block         0.045    0.45     1.2
BLS-GSM       1        5.5      6.6

Table 1

Execution times in seconds for 512 × 512 images and 256 × 256 images. The algorithms
were run under Matlab with an Intel Xeon 3GHz CPU, 8Gb RAM.
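
The timings of Table 1 can be turned into speed ratios with a few lines of code. The sketch below only re-expresses the tabulated values; the dictionary layout and variable names are illustrative.

```python
# Execution times in seconds, copied from Table 1.
times = {
    "512x512": {
        "Block":   {"DWT": 0.22, "UDWT": 2.6, "FDCT": 5.8},
        "BLS-GSM": {"DWT": 3.0, "UDWT": 26.0, "FDCT": 30.0},
    },
    "256x256": {
        "Block":   {"DWT": 0.045, "UDWT": 0.45, "FDCT": 1.2},
        "BLS-GSM": {"DWT": 1.0, "UDWT": 5.5, "FDCT": 6.6},
    },
}

# Ratio of BLS-GSM time to block-denoising time for each size and transform.
speedup = {
    (size, transform): methods["BLS-GSM"][transform] / methods["Block"][transform]
    for size, methods in times.items()
    for transform in methods["Block"]
}

for (size, transform), ratio in sorted(speedup.items()):
    print(f"{size} {transform}: {ratio:.1f}x")
```

On these figures, block denoising is faster than BLS-GSM for every transform and image size, with the smallest ratio obtained for the FDCT.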



Barbara 512 × 512
σ              5      10     15     20     25     30     50
Input PSNR     34.15  28.13  24.61  22.11  20.17  18.59  14.15
Block DWT      36.81  32.50  30.07  28.41  27.16  26.16  23.74
BLS-GSM DWT    36.87  32.65  30.26  28.61  27.40  26.40  23.90
Block UDWT     37.37  3?.21  30.80  29.09  27.77  20.70  21.01
BLS-GSM UDWT   37.44  33.43  31.06  29.40  28.16  27.13  24.49
Block FDCT     37.57  33.68  31.52  30.00  28.83  27.86  25.38
BLS-GSM FDCT   37.63  33.82  31.64  30.08  28.93  28.01  25.36

Lena 512 × 512
σ              5      10     15     20     25     30     50
Input PSNR     34.15  28.13  24.61  22.11  20.17  18.59  14.15
Block DWT      37.61  34.05  31.99  30.62  29.58  28.71  26.36
BLS-GSM DWT    37.41  33.97  31.68  30.62  29.62  28.70  26.36
Block UDWT     38.02  31.75  32.85  31.18  30.11  29.53  27.16
BLS-GSM UDWT   38.16  35.15  33.34  32.02  30.97  30.13  27.78
Block FDCT     38.09  34.78  32.86  31.45  30.43  29.55  27.12
BLS-GSM FDCT   38.10  34.93  33.03  31.60  30.53  29.65  27.02

House 256 × 256
σ              5      10     15     20     25     30     50
Input PSNR     34.15  28.13  24.61  22.11  20.17  18.59  14.15
Block DWT      37.63  33.47  31.33  29.86  28.76  27.79  25.41
BLS-GSM DWT    37.43  33.97  31.77  29.88  29.17  28.43  26.12
Block UDWT     38.10  34.31  32.31  30.86  29.75  28.80  26.35
BLS-GSM UDWT   38.17  34.79  32.95  31.52  30.41  29.49  27.00
Block FDCT     38.35  34.36  32.04  30.32  29.70  28.71  25.90
BLS-GSM FDCT   38.47  34.69  32.47  30.92  29.71  28.72  25.93

Boat 512 × 512
σ              5      10     15     20     25     30     50
Input PSNR     34.15  28.13  24.61  22.11  20.17  18.59  14.15
Block DWT      36.41  32.52  30.41  28.93  27.81  26.97  24.83
BLS-GSM DWT    36.06  32.36  30.36  29.04  27.35  26.76  24.86
Block UDWT     36.89  33.15  31.11  29.67  28.59  27.71  25.45
BLS-GSM UDWT   36.85  33.46  31.52  30.14  29.09  28.22  26.00
Block FDCT     36.89  33.07  31.03  29.65  28.59  27.70  25.49
BLS-GSM FDCT   36.74  33.17  31.20  29.80  28.77  27.88  25.52

Fingerprint 512 × 512
σ              5      10     15     20     25     30     50
Input PSNR     34.15  28.13  24.61  22.11  20.17  18.59  14.15
Block DWT      35.74  31.37  29.10  27.53  26.33  25.34  22.84
BLS-GSM DWT    35.?3  31.08  28.82  27.08  26.01  25.11  22.72
Block UDWT     36.22  31.89  29.62  28.06  26.87  25.90  23.37
BLS-GSM UDWT   36.54  32.23  29.91  28.36  27.20  26.30  23.85
Block FDCT     36.13  31.98  29.66  28.03  26.84  25.92  23.51
BLS-GSM FDCT   36.34  32.14  29.82  28.21  27.05  26.14  23.70

Peppers 256 × 256
σ              5      10     15     20     25     30     50
Input PSNR     34.15  28.13  24.61  22.11  20.17  18.59  14.15
Block DWT      36.81  32.56  30.28  28.64  27.42  26.42  23.77
BLS-GSM DWT    36.?9  32.50  30.38  28.90  27.??  26.70  23.55
Block UDWT     37.48  33.60  31.37  29.74  28.52  27.52  24.71
BLS-GSM UDWT   37.59  33.96  31.78  30.17  28.99  27.97  25.16
Block FDCT     37.09  33.14  30.86  29.17  28.01  27.09  24.38
BLS-GSM FDCT   37.15  33.32  31.10  29.44  28.19  26.85  24.27

Table 2

Comparison of average PSNR over ten realizations of block denoising and
BLS-GSM, with three transforms.
4.3 Reproducible research



Following the philosophy of reproducible research, a toolbox is made available 
freely for download at the address 

http://www.greyc.ensicaen.fr/~jfadili/software.html

This toolbox is a collection of Matlab functions, scripts and datasets for im- 
age block denoising. It requires at least WaveLab 8.02 [H| to run properly. 
The toolbox implements the proposed block denoising procedure with several 
transforms and contains all scripts to reproduce the figures and tables reported 
in this paper. 



5 Conclusion 



In this paper, a Stein block thresholding algorithm for denoising d-dimensional
data is proposed, with a particular focus on 2D images. Our block denoising is a
generalization of one-dimensional BlockJS to d dimensions, with other
transforms than orthogonal wavelets, and handles noise in the coefficient domain
beyond the i.i.d. Gaussian case. Its minimax properties are investigated, and
a fast and appealing algorithm is described. The practical performance of the
designed denoiser was shown to be very promising with several transforms
and a variety of test images. It turns out that the proposed block denoiser is
much faster than state-of-the-art competitors in the literature while reaching
comparable denoising performance.

We believe however that there is still room for improvement of our procedure.
For instance, for $d = 2$, it would be interesting to investigate both theoretically
and in practice how our results can be adapted to anisotropic blocks with
possibly varying sizes. The rationale behind such a modification is to adapt
the blocks to the geometry of the neighborhood. We expect that the analysis
in this case, if possible, would be much more involved. Another interesting line
of research would be to try to improve our convergence rates by relaxing the
condition $q < p \wedge 2$. At this moment, given our definition of the smoothness
space and our derivations in the proof (see appendix), we have not yet found a way
around it. As remarked in subsection 3.1.1, a parameter $\delta$ was introduced,
whose role becomes of interest when addressing linear inverse problems such
as deconvolution. Extension of BlockJS to linear inverse problems also remains
an open question. All these aspects need further investigation that we leave
for future work.
Appendix: Proofs



In this section, C represents a positive constant which may differ from one 
term to another. We suppose that n is large enough. 



A Proof of Theorem 3.1 

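We have the decomposition:

$$R(\hat{\theta}^*, \theta) = R_1 + R_2 + R_3, \tag{A.1}$$

where

$$R_1 = \sum_{j=0}^{j_0-1} \sum_{\ell \in B_j} \sum_{k \in D_j} \mathbb{E}\big((\hat{\theta}^*_{j,\ell,k} - \theta_{j,\ell,k})^2\big), \qquad R_2 = \sum_{j=j_0}^{J_*} \sum_{\ell \in B_j} \sum_{k \in D_j} \mathbb{E}\big((\hat{\theta}^*_{j,\ell,k} - \theta_{j,\ell,k})^2\big),$$

$$R_3 = \sum_{j=J_*+1}^{\infty} \sum_{\ell \in B_j} \sum_{k \in D_j} \theta_{j,\ell,k}^2.$$

Let us bound the terms $R_1$, $R_3$ and $R_2$ (by order of difficulty).

The upper bound for $R_1$. It follows from (A1) that

$$R_1 = n^{-r} \sum_{j=0}^{j_0-1} \sum_{\ell \in B_j} \sum_{k \in D_j} \mathbb{E}(z_{j,\ell,k}^2) \le Q_1 n^{-r} \sum_{j=0}^{j_0-1} 2^{j(d_*+\delta)} \mathrm{Card}(B_j) \le C n^{-r} \sum_{j=0}^{j_0-1} 2^{j(d_*+\delta+\upsilon)}$$

$$\le C L^{(1/\min_{i=1,\ldots,d} \mu_i)(d_*+\delta+\upsilon)} n^{-r} \le C (\log n)^{(1/(d \min_{i=1,\ldots,d} \mu_i))(d_*+\delta+\upsilon)} n^{-r} \le C n^{-2sr/(2s+\delta+d_*+\upsilon)}. \tag{A.2}$$

We used the inequality $2s/(2s+\delta+d_*+\upsilon) < 1$, which implies that, for a large
enough $n$, $(\log n)^{(1/(d \min_{i=1,\ldots,d} \mu_i))(d_*+\delta+\upsilon)}\, n^{2sr/(2s+\delta+d_*+\upsilon) - r} \le 1$.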



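The upper bound for $R_3$. We distinguish the case $q < 2 < p$ and the case
$q < p < 2$.

For $q < 2 < p$, we have $\Theta^s_{p,q}(M) \subseteq \Theta^s_{2,q}(M) \subseteq \Theta^s_{2,2}(M)$. Hence

$$R_3 \le C\, 2^{-2 J_* s} \le C n^{-2sr/(d_*+\delta+\upsilon)} \le C n^{-2sr/(2s+\delta+d_*+\upsilon)}. \tag{A.3}$$

For $q < p < 2$, we have $\Theta^s_{p,q}(M) \subseteq \Theta^{s-d_*/p+d_*/2}_{2,q}(M) \subseteq \Theta^{s-d_*/p+d_*/2}_{2,2}(M)$. We
have

$$s/(2s+\delta+d_*+\upsilon) \le (s - d_*/p + d_*/2)/(d_*+\delta+\upsilon)$$

$$\Leftrightarrow \quad s(d_*+\delta+\upsilon) \le (s - d_*/p + d_*/2)(2s+\delta+d_*+\upsilon)$$

$$\Leftrightarrow \quad 0 \le 2s^2 - (d_*/p - d_*/2)(2s+\delta+d_*+\upsilon)$$

$$\Leftrightarrow \quad 0 \le 2s(s - d_*/p) + s d_* - (d_*/p - d_*/2)(\delta+d_*+\upsilon).$$

This implies that, if $sp > d_*$ and $s > (1/p - 1/2)(\delta+d_*+\upsilon)$, we have
$s/(2s+\delta+d_*+\upsilon) \le (s - d_*/p + d_*/2)/(d_*+\delta+\upsilon)$. Therefore,

$$R_3 \le M^2 \sum_{j=J_*+1}^{\infty} 2^{-2j(s-d_*/p+d_*/2)} \le C\, 2^{-2J_*(s-d_*/p+d_*/2)}$$

$$\le C n^{-2r(s-d_*/p+d_*/2)/(d_*+\delta+\upsilon)} \le C n^{-2sr/(2s+\delta+d_*+\upsilon)}. \tag{A.4}$$

Putting (A.3) and (A.4) together, we obtain the desired upper bound.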
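
The exponent comparison used for the case $q < p < 2$ can be sanity-checked numerically. The sketch below evaluates both sides of $s/(2s+\delta+d_*+\upsilon) \le (s - d_*/p + d_*/2)/(d_*+\delta+\upsilon)$ on a small grid of parameters satisfying $sp > d_*$ and $s > (1/p - 1/2)(\delta+d_*+\upsilon)$. The grid values and the function name `exponents` are arbitrary illustrative choices; a grid check does not replace the algebraic argument above.

```python
import itertools


def exponents(s, p, d_star, delta, upsilon):
    """Return the two exponents compared in the bound for R_3."""
    lhs = s / (2 * s + delta + d_star + upsilon)
    rhs = (s - d_star / p + d_star / 2) / (d_star + delta + upsilon)
    return lhs, rhs


checked = 0
violations = 0
grid = itertools.product(
    [0.5, 1.0, 2.0, 3.0, 5.0],   # smoothness s
    [1.0, 1.25, 1.5, 1.75],      # p < 2
    [1.0, 2.0, 3.0],             # d_*
    [0.0, 0.5, 1.0],             # delta
    [0.0, 1.0, 2.0],             # upsilon
)
for s, p, d_star, delta, upsilon in grid:
    # Keep only the parameter combinations allowed by the two conditions.
    if s * p > d_star and s > (1 / p - 1 / 2) * (delta + d_star + upsilon):
        lhs, rhs = exponents(s, p, d_star, delta, upsilon)
        checked += 1
        if lhs > rhs + 1e-12:
            violations += 1

print(checked, violations)
```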

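The upper bound for $R_2$. We need the following result, which will be proved
later.

Lemma A.1 Let $(v_i)_{i \in \mathbb{N}^*}$ be a sequence of real numbers and $(w_i)_{i \in \mathbb{N}^*}$ be a
sequence of random variables. Set, for any $i \in \mathbb{N}^*$,

$$u_i = v_i + w_i.$$

Then, for any $m \in \mathbb{N}^*$ and any $\lambda > 0$, the sequence of estimates $(\tilde{u}_i)_{i=1,\ldots,m}$
defined by $\tilde{u}_i = u_i \big(1 - \lambda^2 (\sum_{i=1}^m u_i^2)^{-1}\big)_+$ satisfies

$$\sum_{i=1}^m (\tilde{u}_i - v_i)^2 \le 10 \sum_{i=1}^m w_i^2 \mathbf{1}_{\{(\sum_{i=1}^m w_i^2)^{1/2} > \lambda/2\}} + 10 \min\Big(\sum_{i=1}^m v_i^2,\ \lambda^2/4\Big).$$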
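
The inequality of Lemma A.1 is deterministic, so it can be probed by simulation. The sketch below assumes the lemma reads $\sum_i (\tilde{u}_i - v_i)^2 \le 10 \sum_i w_i^2 \mathbf{1}\{(\sum_i w_i^2)^{1/2} > \lambda/2\} + 10 \min(\sum_i v_i^2, \lambda^2/4)$ with $\tilde{u}_i = u_i(1 - \lambda^2/\sum_i u_i^2)_+$ and $u_i = v_i + w_i$. The helper names, the Gaussian draws and the parameter ranges are illustrative assumptions.

```python
import math
import random


def js_shrink(u, lam):
    """Apply the shrinkage u_i * (1 - lam^2 / sum(u_i^2))_+ to a block u."""
    energy = sum(x * x for x in u)
    gain = max(0.0, 1.0 - lam ** 2 / energy) if energy > 0.0 else 0.0
    return [gain * x for x in u]


def lemma_a1_sides(v, w, lam):
    """Return (left-hand side, right-hand side) of the bound of Lemma A.1."""
    u = [a + b for a, b in zip(v, w)]
    u_tilde = js_shrink(u, lam)
    lhs = sum((a - b) ** 2 for a, b in zip(u_tilde, v))
    w_energy = sum(x * x for x in w)
    v_energy = sum(x * x for x in v)
    noise_term = w_energy if math.sqrt(w_energy) > lam / 2 else 0.0
    rhs = 10.0 * noise_term + 10.0 * min(v_energy, lam ** 2 / 4)
    return lhs, rhs


rng = random.Random(0)
worst_gap = float("inf")
for _ in range(2000):
    m = rng.randint(1, 16)
    scale = rng.choice([0.0, 0.1, 1.0, 10.0])
    v = [scale * rng.gauss(0.0, 1.0) for _ in range(m)]
    w = [rng.gauss(0.0, 1.0) for _ in range(m)]
    lam = rng.uniform(0.1, 10.0)
    lhs, rhs = lemma_a1_sides(v, w, lam)
    worst_gap = min(worst_gap, rhs - lhs)

print(worst_gap)
```

A non-negative `worst_gap` means that no draw violated the bound.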



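Lemma A.1 yields

$$R_2 = \sum_{j=j_0}^{J_*} \sum_{\ell \in B_j} \sum_{K \in A_j} \sum_{k \in U_{j,K}} \mathbb{E}\big((\hat{\theta}^*_{j,\ell,k} - \theta_{j,\ell,k})^2\big) \le 10 (B_1 + B_2), \tag{A.5}$$

where

$$B_1 = n^{-r} \sum_{j=j_0}^{J_*} \sum_{\ell \in B_j} \sum_{K \in A_j} \sum_{k \in U_{j,K}} \mathbb{E}\Big(z_{j,\ell,k}^2 \mathbf{1}_{\{(\sum_{k \in U_{j,K}} z_{j,\ell,k}^2)^{1/2} > (\lambda_* 2^{\delta j} L^d)^{1/2}/2\}}\Big)$$

and

$$B_2 = \sum_{j=j_0}^{J_*} \sum_{\ell \in B_j} \sum_{K \in A_j} \min\Big(\sum_{k \in U_{j,K}} \theta_{j,\ell,k}^2,\ \lambda_* 2^{\delta j} L^d n^{-r}/4\Big).$$

Using (A2), we bound $B_1$ by

$$B_1 \le Q_2 n^{-r} \le Q_2 n^{-2sr/(2s+\delta+d_*+\upsilon)}. \tag{A.6}$$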
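To bound $B_2$, we again distinguish the case $q < 2 < p$ and the case $q < p < 2$.

For $q < 2 < p$, we have $\theta \in \Theta^s_{p,q}(M) \subseteq \Theta^s_{2,q}(M) \subseteq \Theta^s_{2,2}(M)$. Let $j_s$ be the
integer $j_s = \lfloor (r/(2s+\delta+d_*+\upsilon)) \log_2 n \rfloor$. We then obtain the bound

$$B_2 \le 4^{-1} \lambda_* L^d n^{-r} \sum_{j=j_0}^{j_s} 2^{\delta j}\, \mathrm{Card}(A_j)\, \mathrm{Card}(B_j) + \sum_{j=j_s+1}^{J_*} \sum_{\ell \in B_j} \sum_{k \in D_j} \theta_{j,\ell,k}^2$$

$$\le C n^{-r} \sum_{j=j_0}^{j_s} 2^{j(d_*+\delta+\upsilon)} + \sum_{j=j_s+1}^{J_*} \sum_{\ell \in B_j} \sum_{k \in D_j} \theta_{j,\ell,k}^2$$

$$\le C n^{-r} 2^{j_s(d_*+\delta+\upsilon)} + C \sum_{j=j_s+1}^{J_*} 2^{-2js} \le C n^{-2sr/(2s+\delta+d_*+\upsilon)}. \tag{A.7}$$

Putting (A.5), (A.6) and (A.7) together, it follows immediately that

$$R_2 \le C n^{-2sr/(2s+\delta+d_*+\upsilon)}. \tag{A.8}$$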

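Let us now turn to the case $q < p < 2$. Let $j_s^*$ be the integer
$j_s^* = \lfloor (r/(2s+\delta+d_*+\upsilon)) \log_2(n/\log n) \rfloor$. We have

$$B_2 \le D_1 + D_2 + D_3, \tag{A.9}$$

where

$$D_1 = 4^{-1} \lambda_* L^d n^{-r} \sum_{j=j_0}^{j_s^*} 2^{\delta j}\, \mathrm{Card}(A_j)\, \mathrm{Card}(B_j),$$

$$D_2 = 4^{-1} \lambda_* L^d n^{-r} \sum_{j=j_s^*+1}^{J_*} \sum_{\ell \in B_j} \sum_{K \in A_j} 2^{\delta j}\, \mathbf{1}_{\{\sum_{k \in U_{j,K}} \theta_{j,\ell,k}^2 > \lambda_* 2^{\delta j} L^d n^{-r}/4\}}$$

and

$$D_3 = \sum_{j=j_s^*+1}^{J_*} \sum_{\ell \in B_j} \sum_{K \in A_j} \sum_{k \in U_{j,K}} \theta_{j,\ell,k}^2\, \mathbf{1}_{\{\sum_{k \in U_{j,K}} \theta_{j,\ell,k}^2 \le \lambda_* 2^{\delta j} L^d n^{-r}/4\}}.$$

We have

$$D_1 \le C (\log n)\, n^{-r} \sum_{j=j_0}^{j_s^*} 2^{j(d_*+\delta+\upsilon)} \le C (\log n / n)^{2sr/(2s+\delta+d_*+\upsilon)}. \tag{A.10}$$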
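Moreover, using the classical inequality $\|\theta\|_2 \le \|\theta\|_p$, $p < 2$, we obtain

$$D_2 \le C (\lambda_* L^d n^{-r})^{1-p/2} \sum_{j=j_s^*+1}^{J_*} 2^{j\delta(1-p/2)} \sum_{\ell \in B_j} \sum_{K \in A_j} \Big(\sum_{k \in U_{j,K}} \theta_{j,\ell,k}^2\Big)^{p/2}$$

$$\le C (\log n / n)^{r(1-p/2)} \sum_{j=j_s^*+1}^{J_*} 2^{j\delta(1-p/2)} \sum_{\ell \in B_j} \sum_{k \in D_j} |\theta_{j,\ell,k}|^p. \tag{A.11}$$

Since $q < p$, we have $\Theta^s_{p,q}(M) \subseteq \Theta^s_{p,p}(M)$. Combining this with $sp > d_*$ and
$s > (1/p - 1/2)(\delta+d_*+\upsilon)$, we obtain

$$D_2 \le C (\log n / n)^{r(1-p/2)} \sum_{j=j_s^*+1}^{J_*} 2^{j\delta(1-p/2)}\, 2^{-j(s+d_*/2-d_*/p)p}$$

$$\le C (\log n / n)^{r(1-p/2)}\, 2^{-j_s^*(s+d_*/2-d_*/p-\delta/p+\delta/2)p}$$

$$\le C (\log n / n)^{(2s+\upsilon(1-p/2))r/(2s+\delta+d_*+\upsilon)}$$

$$\le C (\log n / n)^{2sr/(2s+\delta+d_*+\upsilon)}. \tag{A.12}$$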

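We have, for any $k \in U_{j,K}$, the inclusion $\{\sum_{k \in U_{j,K}} \theta_{j,\ell,k}^2 \le \lambda_* 2^{\delta j} L^d n^{-r}/4\} \subseteq
\{|\theta_{j,\ell,k}| \le (\lambda_* 2^{\delta j} L^d n^{-r})^{1/2}/2\}$. Therefore,

$$D_3 \le \sum_{j=j_s^*+1}^{J_*} \sum_{\ell \in B_j} \sum_{K \in A_j} \sum_{k \in U_{j,K}} \theta_{j,\ell,k}^2\, \mathbf{1}_{\{|\theta_{j,\ell,k}| \le (\lambda_* 2^{\delta j} L^d n^{-r})^{1/2}/2\}}$$

$$\le C (\lambda_* L^d n^{-r})^{1-p/2} \sum_{j=j_s^*+1}^{J_*} 2^{j\delta(1-p/2)} \sum_{\ell \in B_j} \sum_{k \in D_j} |\theta_{j,\ell,k}|^p,$$

which is the same bound as for $D_2$ in (A.11). Then, using similar arguments
as those used for $D_2$ in (A.12), we arrive at

$$D_3 \le C (\log n / n)^{2sr/(2s+\delta+d_*+\upsilon)}. \tag{A.13}$$

Inserting (A.10), (A.12) and (A.13) into (A.9), it follows that

$$R_2 \le C (\log n / n)^{2sr/(2s+\delta+d_*+\upsilon)}. \tag{A.14}$$

Finally, bringing (A.1), (A.2), (A.3), (A.4), (A.8) and (A.14) together, we obtain

$$\sup_{\theta \in \Theta^s_{p,q}(M)} R(\hat{\theta}^*, \theta) \le R_1 + R_2 + R_3 \le C \rho_n,$$

where $\rho_n$ is defined by (3.4). This ends the proof of Theorem 3.1.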
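
The passage between the second and third lines of (A.12) is a matter of collecting exponents. The following worked computation spells it out, under the assumptions that $2^{j_s^*}$ is of order $(n/\log n)^{r/\kappa}$ and with the shorthand $\kappa = 2s+\delta+d_*+\upsilon$, which is introduced here only for readability and is not notation from the paper.

```latex
% Exponent of (log n / n) after inserting 2^{j_s^*} ~ (n / log n)^{r/kappa},
% with kappa = 2s + delta + d_* + upsilon (shorthand introduced for this sketch).
\begin{align*}
r\Bigl(1-\tfrac{p}{2}\Bigr)
  + \frac{r p}{\kappa}\Bigl(s+\tfrac{d_*}{2}-\tfrac{d_*}{p}-\tfrac{\delta}{p}+\tfrac{\delta}{2}\Bigr)
&= \frac{r}{\kappa}\Bigl[\Bigl(1-\tfrac{p}{2}\Bigr)\kappa
   + p s + \tfrac{p\,d_*}{2} - d_* - \delta + \tfrac{p\,\delta}{2}\Bigr] \\
&= \frac{r}{\kappa}\Bigl[2s + \upsilon\Bigl(1-\tfrac{p}{2}\Bigr)\Bigr]
 \;\ge\; \frac{2 s r}{\kappa}.
\end{align*}
```

Since $\log n / n < 1$ for large $n$, $p < 2$ and $\upsilon \ge 0$, a larger exponent gives a smaller quantity, which is how the last line of (A.12) follows from the third.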
B Proof of Lemma A.1



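We have

$$\sum_{i=1}^m (\tilde{u}_i - v_i)^2 = \max(A, B), \tag{B.1}$$

where

$$A = \sum_{i=1}^m \Big(w_i - \lambda^2 u_i \Big(\sum_{i=1}^m u_i^2\Big)^{-1}\Big)^2 \mathbf{1}_{\{(\sum_{i=1}^m u_i^2)^{1/2} > \lambda\}}, \qquad B = \sum_{i=1}^m v_i^2\, \mathbf{1}_{\{(\sum_{i=1}^m u_i^2)^{1/2} \le \lambda\}}.$$

Let us bound $A$ and $B$ in turn.

The upper bound for $A$. Using the elementary inequality $(a - b)^2 \le 2(a^2 + b^2)$, we have

$$A \le 2 \Big(\sum_{i=1}^m w_i^2 + \lambda^4 \Big(\sum_{i=1}^m u_i^2\Big)^{-1}\Big) \mathbf{1}_{\{(\sum_{i=1}^m u_i^2)^{1/2} > \lambda\}} \le D, \qquad D = 2 \Big(\sum_{i=1}^m w_i^2 + \lambda^2\Big) \mathbf{1}_{\{(\sum_{i=1}^m u_i^2)^{1/2} > \lambda\}}. \tag{B.2}$$

We have the decomposition

$$D = D_1 + D_2, \tag{B.3}$$

where

$$D_1 = 2 \Big(\sum_{i=1}^m w_i^2 + \lambda^2\Big) \mathbf{1}_{\{(\sum_{i=1}^m u_i^2)^{1/2} > \lambda\} \cap \{(\sum_{i=1}^m w_i^2)^{1/2} > \lambda/2\}}, \qquad D_2 = 2 \Big(\sum_{i=1}^m w_i^2 + \lambda^2\Big) \mathbf{1}_{\{(\sum_{i=1}^m u_i^2)^{1/2} > \lambda\} \cap \{(\sum_{i=1}^m w_i^2)^{1/2} \le \lambda/2\}}.$$

We clearly have

$$D_1 \le 2 \Big(\sum_{i=1}^m w_i^2 + \lambda^2\Big) \mathbf{1}_{\{(\sum_{i=1}^m w_i^2)^{1/2} > \lambda/2\}} \le 10 \sum_{i=1}^m w_i^2\, \mathbf{1}_{\{(\sum_{i=1}^m w_i^2)^{1/2} > \lambda/2\}}. \tag{B.4}$$

Using the Minkowski inequality, we have the inclusion $\{(\sum_{i=1}^m u_i^2)^{1/2} > \lambda\} \cap
\{(\sum_{i=1}^m w_i^2)^{1/2} \le \lambda/2\} \subseteq \{(\sum_{i=1}^m v_i^2)^{1/2} > \lambda/2\} \cap \{(\sum_{i=1}^m w_i^2)^{1/2} \le \lambda/2\}$.
Therefore

$$D_2 \le 2 \Big(\sum_{i=1}^m w_i^2 + \lambda^2\Big) \mathbf{1}_{\{(\sum_{i=1}^m v_i^2)^{1/2} > \lambda/2\} \cap \{(\sum_{i=1}^m w_i^2)^{1/2} \le \lambda/2\}} \le 10 \min\Big(\sum_{i=1}^m v_i^2,\ \lambda^2/4\Big). \tag{B.5}$$

If we combine (B.2), (B.3), (B.4) and (B.5), we obtain

$$A \le 10 \sum_{i=1}^m w_i^2\, \mathbf{1}_{\{(\sum_{i=1}^m w_i^2)^{1/2} > \lambda/2\}} + 10 \min\Big(\sum_{i=1}^m v_i^2,\ \lambda^2/4\Big). \tag{B.6}$$

The upper bound for $B$. We have the decomposition

$$B = G_1 + G_2, \tag{B.7}$$

where $G_1$ and $G_2$ denote the restrictions of $B$ to the events $\{(\sum_{i=1}^m w_i^2)^{1/2} > \lambda/2\}$
and $\{(\sum_{i=1}^m w_i^2)^{1/2} \le \lambda/2\}$ respectively.

Using the Minkowski inequality, we have again the inclusion $\{(\sum_{i=1}^m u_i^2)^{1/2} \le \lambda\} \cap
\{(\sum_{i=1}^m w_i^2)^{1/2} > \lambda/2\} \subseteq \{(\sum_{i=1}^m v_i^2)^{1/2} \le 3 (\sum_{i=1}^m w_i^2)^{1/2}\} \cap \{(\sum_{i=1}^m w_i^2)^{1/2} > \lambda/2\}$.
It follows that

$$G_1 \le \sum_{i=1}^m v_i^2\, \mathbf{1}_{\{(\sum_{i=1}^m v_i^2)^{1/2} \le 3 (\sum_{i=1}^m w_i^2)^{1/2}\} \cap \{(\sum_{i=1}^m w_i^2)^{1/2} > \lambda/2\}} \le 9 \sum_{i=1}^m w_i^2\, \mathbf{1}_{\{(\sum_{i=1}^m w_i^2)^{1/2} > \lambda/2\}}. \tag{B.8}$$

Another application of the Minkowski inequality leads to the inclusion $\{(\sum_{i=1}^m u_i^2)^{1/2} \le \lambda\} \cap
\{(\sum_{i=1}^m w_i^2)^{1/2} \le \lambda/2\} \subseteq \{(\sum_{i=1}^m v_i^2)^{1/2} \le 3\lambda/2\} \cap \{(\sum_{i=1}^m w_i^2)^{1/2} \le \lambda/2\}$.
It follows that

$$G_2 \le \sum_{i=1}^m v_i^2\, \mathbf{1}_{\{(\sum_{i=1}^m v_i^2)^{1/2} \le 3\lambda/2\} \cap \{(\sum_{i=1}^m w_i^2)^{1/2} \le \lambda/2\}} \le \min\Big(\sum_{i=1}^m v_i^2,\ 9\lambda^2/4\Big). \tag{B.9}$$

Therefore, if we combine (B.7), (B.8) and (B.9), we obtain

$$B \le 9 \sum_{i=1}^m w_i^2\, \mathbf{1}_{\{(\sum_{i=1}^m w_i^2)^{1/2} > \lambda/2\}} + \min\Big(\sum_{i=1}^m v_i^2,\ 9\lambda^2/4\Big). \tag{B.10}$$

Putting (B.1), (B.6) and (B.10) together, we have

$$\sum_{i=1}^m (\tilde{u}_i - v_i)^2 = \max(A, B) \le 10 \sum_{i=1}^m w_i^2\, \mathbf{1}_{\{(\sum_{i=1}^m w_i^2)^{1/2} > \lambda/2\}} + 10 \min\Big(\sum_{i=1}^m v_i^2,\ \lambda^2/4\Big).$$

Lemma A.1 is proved.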



C Proof of Proposition 3.1 



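First of all, notice that the Jensen inequality, (A3) and the fact that $\mathrm{Card}(D_j) \le
2^{j d_*}$ imply

$$\sup_{j \in \{0,\ldots,J\}} \sup_{\ell \in B_j} 2^{-j(d_*+\delta)} \sum_{k \in D_j} \mathbb{E}(z_{j,\ell,k}^2) \le \sup_{j \in \{0,\ldots,J\}} 2^{-j(d_*+\delta)} \sup_{\ell \in B_j} \sum_{k \in D_j} \big(\mathbb{E}(z_{j,\ell,k}^4)\big)^{1/2}$$

$$\le Q_3^{1/2} \sup_{j \in \{0,\ldots,J\}} 2^{-j d_*}\, \mathrm{Card}(D_j) \le Q_3^{1/2}.$$

Therefore (A1) is satisfied.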

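Let us now turn to (A2). Again, the Jensen inequality yields

$$\sum_{j=j_0}^{J_*} \sum_{\ell \in B_j} \sum_{K \in A_j} \sum_{k \in U_{j,K}} \mathbb{E}\Big(z_{j,\ell,k}^2\, \mathbf{1}_{\{(\sum_{k \in U_{j,K}} z_{j,\ell,k}^2)^{1/2} > (\lambda_* 2^{\delta j} L^d)^{1/2}/2\}}\Big)$$

$$\le \sum_{j=j_0}^{J_*} \sum_{\ell \in B_j} \sum_{K \in A_j} \sum_{k \in U_{j,K}} \big(\mathbb{E}(z_{j,\ell,k}^4)\big)^{1/2} \Big(\mathbb{P}\Big(\Big(\sum_{k \in U_{j,K}} z_{j,\ell,k}^2\Big)^{1/2} > (\lambda_* 2^{\delta j} L^d)^{1/2}/2\Big)\Big)^{1/2}.$$

Using (A3), it comes that

$$\sum_{j=j_0}^{J_*} \sum_{\ell \in B_j} \sum_{K \in A_j} \sum_{k \in U_{j,K}} \mathbb{E}\Big(z_{j,\ell,k}^2\, \mathbf{1}_{\{(\sum_{k \in U_{j,K}} z_{j,\ell,k}^2)^{1/2} > (\lambda_* 2^{\delta j} L^d)^{1/2}/2\}}\Big)$$

$$\le C\, 2^{J_*(d_*+\delta+\upsilon)} Q_3^{1/2} \sup_{j \in \{j_0,\ldots,J_*\}} \sup_{\ell \in B_j} \sup_{K \in A_j} \Big(\mathbb{P}\Big(\Big(\sum_{k \in U_{j,K}} z_{j,\ell,k}^2\Big)^{1/2} > (\lambda_* 2^{\delta j} L^d)^{1/2}/2\Big)\Big)^{1/2}$$

$$\le C n^r Q_3^{1/2} \sup_{j \in \{j_0,\ldots,J_*\}} \sup_{\ell \in B_j} \sup_{K \in A_j} \Big(\mathbb{P}\Big(\Big(\sum_{k \in U_{j,K}} z_{j,\ell,k}^2\Big)^{1/2} > (\lambda_* 2^{\delta j} L^d)^{1/2}/2\Big)\Big)^{1/2}. \tag{C.1}$$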



To bound the probability term, we introduce the Borell inequality. For further
details about this inequality, see, for instance, Q]- 

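Lemma C.1 (The Borell inequality) Let $\mathcal{D}$ be a subset of $\mathbb{R}$ and $(\eta_t)_{t \in \mathcal{D}}$
be a centered Gaussian process. Suppose that

$$\mathbb{E}\Big(\sup_{t \in \mathcal{D}} \eta_t\Big) \le N \quad \text{and} \quad \sup_{t \in \mathcal{D}} \mathbb{V}(\eta_t) \le Z.$$

Then, for any $x > 0$, we have

$$\mathbb{P}\Big(\sup_{t \in \mathcal{D}} \eta_t \ge x + N\Big) \le \exp\big(-x^2/(2Z)\big).$$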



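Let us consider the set $S_2$ defined by $S_2 = \{a = (a_k)_{k \in U_{j,K}};\ \sum_{k \in U_{j,K}} a_k^2 = 1\}$,
and the centered Gaussian process $Z(a)$ defined by

$$Z(a) = \sum_{k \in U_{j,K}} a_k z_{j,\ell,k}.$$

We have by the Cauchy-Schwarz inequality

$$\sup_{a \in S_2} Z(a) = \sup_{a \in S_2} \sum_{k \in U_{j,K}} a_k z_{j,\ell,k} = \Big(\sum_{k \in U_{j,K}} z_{j,\ell,k}^2\Big)^{1/2}.$$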


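In order to use Lemma C.1, we have to investigate the upper bounds for
$\mathbb{E}(\sup_{a \in S_2} Z(a))$ and $\sup_{a \in S_2} \mathbb{V}(Z(a))$.

The upper bound for $\mathbb{E}(\sup_{a \in S_2} Z(a))$. The Jensen inequality and (A3)
imply that

$$\mathbb{E}\Big(\sup_{a \in S_2} Z(a)\Big) = \mathbb{E}\Big(\Big(\sum_{k \in U_{j,K}} z_{j,\ell,k}^2\Big)^{1/2}\Big) \le \Big(\sum_{k \in U_{j,K}} \mathbb{E}(z_{j,\ell,k}^2)\Big)^{1/2} \le \Big(\sum_{k \in U_{j,K}} \big(\mathbb{E}(z_{j,\ell,k}^4)\big)^{1/2}\Big)^{1/2}.$$

So, $N = Q_3^{1/4} 2^{\delta j/2} (\log n)^{1/2}$.

The upper bound for $\sup_{a \in S_2} \mathbb{V}(Z(a))$. By assumption, for any $j \in \mathbb{N}$ and
$k \in D_j$, we have $\mathbb{E}(z_{j,\ell,k}) = 0$. The assumption (A4) yields

$$\sup_{a \in S_2} \mathbb{V}(Z(a)) = \sup_{a \in S_2} \mathbb{E}\Big(\Big(\sum_{k \in U_{j,K}} a_k z_{j,\ell,k}\Big)^2\Big) \le Q_4 2^{\delta j}.$$

It is then sufficient to take $Z = Q_4 2^{\delta j}$.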

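Combining the obtained expressions of $N$ and $Z$ with Lemma C.1, for any
$j \in \{j_0, \ldots, J_*\}$, $K \in A_j$ and $k \in U_{j,K}$, we have

$$\mathbb{P}\Big(\Big(\sum_{k \in U_{j,K}} z_{j,\ell,k}^2\Big)^{1/2} > (\lambda_* 2^{\delta j} L^d)^{1/2}/2\Big)$$

$$= \mathbb{P}\Big(\Big(\sum_{k \in U_{j,K}} z_{j,\ell,k}^2\Big)^{1/2} > (\lambda_*^{1/2}/2 - Q_3^{1/4})(2^{\delta j} L^d)^{1/2} + Q_3^{1/4} (2^{\delta j} L^d)^{1/2}\Big)$$

$$= \mathbb{P}\Big(\sup_{a \in S_2} Z(a) > (\lambda_*^{1/2}/2 - Q_3^{1/4})(2^{\delta j} L^d)^{1/2} + N\Big)$$

$$\le \exp\big(-(\lambda_*^{1/2}/2 - Q_3^{1/4})^2\, 2^{\delta j} L^d/(2Z)\big) \le n^{-(\lambda_*^{1/2}/2 - Q_3^{1/4})^2/(2 Q_4)}.$$

Since $\lambda_* = 4 \big((2 Q_4)^{1/2} + Q_3^{1/4}\big)^2$, it follows that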



P 



1/2 



E 



n 



(C.2) 



Putting (C.1) and (C.2) together, we have proved (A2). This ends the proof
of Proposition 3.1. 
D Proof of Proposition 3.2



The proof of this proposition is similar to the one of Theorem 3.1. The only 
difference is that, instead of using Lemma A.l, we use Lemma D.l below. 

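Lemma D.1 ([9]) Let $(v_i)_{i \in \mathbb{N}^*}$ be a sequence of real numbers, $(w_i)_{i \in \mathbb{N}^*}$ be i.i.d.
$\mathcal{N}(0, 1)$ and $\sigma \in \mathbb{R}^*$. Set, for any $i \in \mathbb{N}^*$,

$$u_i = v_i + \sigma w_i.$$

Then, for any $m \in \mathbb{N}^*$ and any $\gamma > 1$, the sequence of estimates $(\tilde{u}_i)_{i=1,\ldots,m}$
defined by $\tilde{u}_i = u_i \big(1 - \gamma m \sigma^2 (\sum_{i=1}^m u_i^2)^{-1}\big)_+$ satisfies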

1=1 / \j=i 

To clarify, if the variables $(z_{j,\ell,k})_{j,\ell,k}$ are i.i.d. $\mathcal{N}(0, 1)$, then Lemma D.1
improves the bound of the term $B_1$ appearing in the proof of Theorem 3.1.

If we analyze the proof of Theorem 3.1 and we use Lemma D.1 instead of
Lemma A.1, we see that it is enough to determine $\lambda_*$ such that there exists a
constant $Q_5 > 0$ satisfying

$$\sum_{j=j_0}^{J_*} \mathrm{Card}(A_j)\, \mathrm{Card}(B_j)\, e^{-(L^d/2)(\lambda_* - \log \lambda_* - 1)} \le Q_5.$$

(It corresponds to the bound of the term $B_1$ that appears in (A.5).) If $\lambda_*$ is
the root of $x - \log x = 3$, it comes that

$$\sum_{j=j_0}^{J_*} \mathrm{Card}(A_j)\, \mathrm{Card}(B_j)\, e^{-(L^d/2)(\lambda_* - \log \lambda_* - 1)} \le C e^{-(L^d/2)(\lambda_* - \log \lambda_* - 1)}\, 2^{J_*(d_*+\upsilon)}$$

$$\le C e^{-(L^d/2)(\lambda_* - \log \lambda_* - 1)}\, n^r \le Q_5.$$

Proposition 3.2 is proved.